Combining Joint Factor Analysis and iVectors for Robust Language Recognition
Abstract
This paper presents a system to identify the spoken language in challenging audio material such as broadcast news shows. The audio material targeted by the system is characterized by a large range of background conditions (e.g., studio recordings vs. outdoor interviews) and a considerable number of non-native speakers. The designed model-based language classifier automatically identifies intervals of Flemish (Belgian Dutch), English or French speech. The proposed system is iVector-based, but unlike the standard approach, it does not model the Total Variability. Instead, it relies on the original Joint Factor Analysis recipe by modeling the different sources of variability separately. For each speaker, a fixed-length low-dimensional feature vector is extracted which encodes the language variability and the other sources of variability separately. The language factors are then fed to a simple language classifier. When assessed on a self-composed dataset containing 9 hours of monolingual broadcast news, 9 hours of multilingual broadcast news and 10 hours of documentaries, this classifier is found to outperform a state-of-the-art eigenchannel-compensated, discriminatively trained GMM system by up to 20% relative. A standard iVector baseline is outperformed by up to 40% relative.
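The contrast the abstract draws between the two recipes can be sketched with the conventional supervector models. This is the generic textbook form, not the paper's own notation, which the abstract does not give. The symbols below are the usual ones, and reading V as a language subspace (rather than a speaker subspace) is an assumption made here to match the abstract's description.

```latex
% Standard iVector (Total Variability) model: a single subspace T
% captures all variability, and w is the iVector.
\[
  M = m + T\,w
\]

% Joint Factor Analysis: separate subspaces for separate sources.
% Assumed reading for language recognition: y holds the language
% factors and x the factors for the other (nuisance) variability.
\[
  M = m + V\,y + U\,x + D\,z
\]

% m : mean supervector of the universal background model
% V : low-rank matrix spanning the variability of interest (assumed: language)
% U : low-rank matrix spanning the remaining variability (e.g. channel)
% D : diagonal matrix for the residual, z its residual factors
```

Under this reading, only the factors y would be passed on to the language classifier, while x and z absorb the remaining variability.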
Related Resources
Language Recognition in iVectors Space
The concept of so-called iVectors, where each utterance is represented by a fixed-length low-dimensional feature vector, has recently become very successful in speaker verification. In this work, we apply the same idea in the context of Language Recognition (LR). To recognize language in the iVector space, we experiment with three different linear classifiers: one based on a generative model, w...
Noise robust speaker verification with delta cepstrum normalization
This paper introduces a delta cepstrum normalization (DCN) technique for speaker verification under noisy conditions. Cepstral feature normalization techniques are widely used to mitigate spectral variations caused by various types of noise; however, little attention has been paid to normalizing delta features. A DCN technique that normalizes not only base features but also delta-features was r...
The Use of Robust Factor Analysis of Compositional Geochemical Data for the Recognition of the Target Area in Khusf 1:100000 Sheet, South Khorasan, Iran
The closed nature of geochemical data has been proven in many studies. Compositional data have special properties that mean that standard statistical methods cannot be used to analyse them. These data imply a particular geometry called Aitchison geometry in the simplex space. For analysis, the dataset must first be opened by the various transformations provided. One of the most popular of the a...
Cosine Similarity Scoring without Score Normalization Techniques
In recent work [1], a simplified and highly effective approach to speaker recognition based on the cosine similarity between low-dimensional vectors, termed ivectors, defined in a total variability space was introduced. The total variability space representation is motivated by the popular Joint Factor Analysis (JFA) approach, but does not require the complication of estimating separate speaker ...
Online Persian Handwriting Recognition Using a Language Model and Reduction of User Writing Rules
The joined-up, cursive form of Persian words, the immense variety of its scripts, and the different shapes Persian letters take depending on their position in a word have made Persian handwriting recognition a considerable challenge. The main shortcoming of most common recognition methods is their inattention to sentence context, which leads to the use of a word with correct appea...